I. INTRODUCTION



One of the most important tests in statistical 
applications arises in testing hypotheses about the 
distribution of a population. Before specifying a model, 
usually we should look at the data to see if they appear 
to have come from the distribution which we expect to use 
for the model. This can be approached through the histogram, 
which gives us information about the density function of 
the underlying distribution. Another approach is the sample 
distribution function, which gives us an estimate of the 
underlying cumulative distribution function. 

We call a test "a test of goodness-of-fit" if the test
is concerned with the agreement between the distribution
of a set of sample values and some theoretical distribution.
Much work has been devoted to finding test statistics
whose distributions do not depend on parameters in the
distribution of the underlying population. Such tests are
commonly called distribution-free tests.

One of the best-known and most useful goodness-of-fit
tests is the Kolmogorov-Smirnov goodness-of-fit test.
The Kolmogorov-Smirnov test treats individual observations
separately and thus does not lose information through
grouping as does the Chi-square test. Consequently, in
the continuous case, the Chi-square test is frequently less
powerful than the Kolmogorov-Smirnov test.

It is also known that the Kolmogorov-Smirnov test is
conservative when the hypothesized distribution function 
is not continuous. In many situations, observations from 
a continuous distribution are grouped. However, studies 
of the modifications of the Kolmogorov-Smirnov test for 
use with grouped data appear to be rather limited. 

The Chi-square test, suggested by Pearson (1900), is 
well suited for use with grouped data, whereas the K-S 
test is for random samples from continuous populations. 

W.J. Conover's procedure was designed to calculate critical 
levels for random samples from discrete distributions. Since 
grouping the data from continuous or discrete populations 
will result in a corresponding underlying distribution 
which is discrete, Conover's procedure can also be used 
for grouped data. The general problem of defining the 
classes or determining the class boundaries in some optimal 
way has apparently received limited attention. In what
follows we propose a Kolmogorov-Smirnov test for grouped
data, together with a method of finding (almost) exact
critical levels and the power of this test. Thus the
Kolmogorov-Smirnov test may be used as a goodness-of-fit
test, regardless of whether the hypothesized distribution
is continuous, or whether the samples are grouped.

The following listing constitutes the description or
definition of notation used herein:

Notation                 Description

K-S test                 Kolmogorov-Smirnov test for continuous
                         distribution function and ungrouped data.

K-S(g) test              Kolmogorov-Smirnov test for grouped data.

$S_n$                    Empirical distribution function of a
                         random sample of size n.

$S_n^g$                  Empirical distribution function of a
                         random sample of size n which is grouped.

$n$                      Sample size.

$\alpha$                 Critical level of test.

$\delta$                 Critical value associated with continuous
                         distribution function and ungrouped data.

$\delta_g$               Critical value associated with grouped data.

$F_0$                    Hypothesized distribution function.

$F_X$                    Some population distribution function.

$b_i$                    Some fixed real number representing the
                         class boundary.

$H_0$                    Null hypothesis.

$H_1$                    Alternative hypothesis.

$\hat{p}_i$              Relative frequency of acceptance of null
                         hypothesis.

$X_{(1)} \le X_{(2)} \le \dots \le X_{(n)}$
                         Ordered observed random sample from
                         distribution F.


II. GENERAL DESCRIPTION OF THE TEST



The Kolmogorov-Smirnov test for goodness-of-fit is
based on the maximum vertical difference between the
empirical and the hypothesized cumulative distribution
function. Under $H_0$ the empirical distribution function
$S_n(\cdot)$ is expected to be close to the specified
distribution function, $F_0(\cdot)$. If the maximum vertical
difference between the specified and the empirical
distribution is not small enough for some x, say greater
than a critical value $\delta$, this may be considered
evidence that the hypothesized distribution is not the one
from which the sample was drawn.

Suppose $F_0(\cdot)$ is the hypothesized distribution and
$S_n(\cdot)$ is the empirical distribution function; then

$$S_n(x) = \frac{k}{n}$$

for x between the k-th and (k + 1)-st ordered values in the
sample (taken in increasing order), where n is the sample size.

The Kolmogorov-Smirnov test is: accept the hypothesis
$X \sim F_0(\cdot)$ if and only if

$$D_n = \sup_{-\infty < x < \infty} \left| F_0(x) - S_n(x) \right| \le \delta$$

where $\delta$ is adjusted to give a level $\alpha$ test. This is
sometimes called the two-sided Kolmogorov-Smirnov statistic, as
opposed to the one-sided statistics $D_n^+$ and $D_n^-$, where

$$D_n^+ = \sup_{-\infty < x < \infty} \left( S_n(x) - F_0(x) \right), \qquad
D_n^- = \sup_{-\infty < x < \infty} \left( F_0(x) - S_n(x) \right).$$
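The definitions above can be illustrated with a minimal sketch for ungrouped data. The function name `ks_statistics`, the three-point sample and the U(0, 1) hypothesized distribution are assumptions made only for illustration; they do not come from the text.

```python
def ks_statistics(sample, cdf):
    """Return (D_plus, D_minus, D) for an ungrouped sample and a continuous cdf."""
    xs = sorted(sample)
    n = len(xs)
    # S_n jumps from (i-1)/n to i/n at the i-th ordered value, so both
    # suprema are attained at (or just before) an observed point.
    d_plus = max(i / n - cdf(x) for i, x in enumerate(xs, start=1))
    d_minus = max(cdf(x) - (i - 1) / n for i, x in enumerate(xs, start=1))
    return d_plus, d_minus, max(d_plus, d_minus)


def uniform01_cdf(x):
    """Hypothesized cdf F_0 of U(0, 1) (illustrative choice)."""
    return min(max(x, 0.0), 1.0)


d_plus, d_minus, d_two_sided = ks_statistics([0.1, 0.4, 0.7], uniform01_cdf)
print(d_plus, d_minus, d_two_sided)
```

The two-sided statistic is simply the larger of the two one-sided statistics, which is why the function computes them first.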







It is well known that if $X_1, X_2, \dots, X_n$ is a random
sample from a continuous distribution function $F_0(\cdot)$,
$D_n$ is distribution-free, i.e., has a distribution independent
of $F_0(\cdot)$.

Let $X_1, X_2, \dots, X_n$ be independent random variables
with the common cumulative distribution function $F_0(\cdot)$.
Furthermore let $x_{(1)}, x_{(2)}, \dots, x_{(n)}$ be the result of
n independent observations, arranged in order of size, that
is, the n order statistics. Suppose the above random
samples are divided into k groups, given by class boundaries
$x_{j_i}$, $i = 1, 2, \dots, k$, where the $x_{j_i}$'s are the
outcomes (observed values) of certain of the order statistics.

Let $S_n^g(x)$ be the sample distribution function based on
the grouped data. Then $S_n^g(x)$ is a step function which
jumps at class boundaries and so can be written:

$$S_n^g(x) = \begin{cases}
0 & \text{for } x < x_{(1)} \\
\dfrac{j_i}{n} & \text{for } x_{j_i} \le x < x_{j_{i+1}} \\
1 & \text{for } x \ge x_{(n)}
\end{cases}$$

For the grouped data, the maximum vertical difference
between $F_0(x)$ and $S_n^g(x)$ occurs at one of the class
boundaries. The two-sided Kolmogorov-Smirnov test statistic
for grouped data (K-S(g) statistic) is defined to be:

$$D_n = \max_{x \in \{x_{j_1}, \dots, x_{j_k}\}} \left| F_0(x) - S_n^g(x) \right|$$

The one-sided K-S(g) test statistics are:

$$D_n^+ = \max_{x \in \{x_{j_1}, \dots, x_{j_k}\}} \left( S_n^g(x) - F_0(x) \right)$$

$$D_n^- = \max_{x \in \{x_{j_1}, \dots, x_{j_k}\}} \left( F_0(x) - S_n^g(x) \right)$$
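A minimal sketch of the grouped statistics follows, evaluating the differences only at the class boundaries as in the definitions above. The helper name `grouped_ks_statistics`, the ten-point sample, the boundaries and the U(0, 1) hypothesis are illustrative assumptions, not values from the text.

```python
def grouped_ks_statistics(sample, boundaries, cdf):
    """Return (D_plus, D_minus, D) evaluated only at the class boundaries."""
    n = len(sample)
    d_plus = float("-inf")
    d_minus = float("-inf")
    for b in boundaries:
        s_g = sum(1 for x in sample if x <= b) / n  # grouped EDF at boundary b
        f0 = cdf(b)                                 # hypothesized cdf at boundary b
        d_plus = max(d_plus, s_g - f0)
        d_minus = max(d_minus, f0 - s_g)
    # The largest absolute difference is the larger of the two one-sided maxima.
    return d_plus, d_minus, max(d_plus, d_minus)


def uniform01_cdf(x):
    """Hypothesized cdf of U(0, 1) (illustrative choice)."""
    return min(max(x, 0.0), 1.0)


sample = [0.02, 0.11, 0.19, 0.23, 0.31, 0.44, 0.58, 0.81, 0.90, 0.97]
boundaries = [0.25, 0.50, 0.75]
g_plus, g_minus, g_two_sided = grouped_ks_statistics(sample, boundaries, uniform01_cdf)
print(g_plus, g_minus, g_two_sided)
```

Here the grouped EDF takes the values .4, .6 and .7 at the three boundaries, against hypothesized values .25, .50 and .75.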






In what follows, it will be shown that these three
statistics are distribution free, provided that $F_0(x)$ is
continuous.

Make the change of variables:

$$y = F_0(x_{j_i}), \qquad Y_j = F_0(X_j).$$

Since $F_0(\cdot)$ is nondecreasing, the inequality
$X_j \le x_{j_i}$ is equivalent to $F_0(X_j) \le F_0(x_{j_i})$,
that is, to $Y_j \le y$. Hence

$$S_n^g(x_{j_i}) = \frac{1}{n} (\text{the number of } X_j \le x_{j_i})
= \frac{1}{n} (\text{the number of } Y_j \le y) = H(y).$$

Furthermore, the distribution of $Y_j$ is:

$$P[Y_j \le y] = P[X_j \le x_{j_i}] = F_0(x_{j_i}) = y.$$

Thus, $Y_j$ has a uniform distribution and $D_n$ can be written as

$$D_n = \max_{x \in \{x_{j_1}, \dots, x_{j_k}\}} \left| F_0(x) - S_n^g(x) \right|
= \max_{i = 1, 2, \dots, k} \left| F_0(x_{j_i}) - S_n^g(x_{j_i}) \right|$$

so that

$$\left| F_0(x_{j_i}) - S_n^g(x_{j_i}) \right| = \left| y - H(y) \right|$$

and

$$D_n = \max_{x \in \{x_{j_1}, \dots, x_{j_k}\}} \left| F_0(x) - S_n^g(x) \right|
= \max_{y \in \{y_1, \dots, y_k\}} \left| y - H(y) \right|.$$

The expression on the right is the vertical distance between
y and the sample distribution function $H(y)$ of
$Y_1, Y_2, \dots, Y_n$, each U(0, 1).
Since this expression does not depend upon $F_0$, $D_n$ is
distribution free. In this case, where the class boundaries are
order statistics, $D_n$ is distribution free. Massey's procedure (8)
or Davidson's method (4) could be used to calculate critical
values for this test.
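The change of variables can be checked numerically. The sketch below is only an illustration under assumed inputs: an eight-point sample, an exponential hypothesized distribution with mean 6, and the third and sixth order statistics taken as class boundaries. None of these values comes from the text. It computes the grouped statistic once on the original scale and once after transforming by $F_0$, and the two results coincide.

```python
import math


def grouped_d(values, boundaries, cdf):
    """Two-sided grouped statistic: max |F_0(b) - S_n^g(b)| over the boundaries."""
    n = len(values)
    return max(
        abs(cdf(b) - sum(1 for v in values if v <= b) / n) for b in boundaries
    )


def exp_cdf(x, mean=6.0):
    """Hypothesized exponential cdf (the mean of 6 is an illustrative choice)."""
    return 1.0 - math.exp(-x / mean) if x > 0 else 0.0


def uniform01_cdf(y):
    """Cdf of U(0, 1), the distribution of Y = F_0(X) under the null hypothesis."""
    return min(max(y, 0.0), 1.0)


x = [0.8, 1.9, 2.7, 4.4, 6.1, 9.3, 12.5, 20.2]
ordered = sorted(x)
x_bounds = [ordered[2], ordered[5]]        # class boundaries are order statistics

y = [exp_cdf(v) for v in x]                # Y_j = F_0(X_j)
y_bounds = [exp_cdf(b) for b in x_bounds]  # y = F_0(x_{j_i})

d_x = grouped_d(x, x_bounds, exp_cdf)
d_y = grouped_d(y, y_bounds, uniform01_cdf)
print(d_x, d_y)
```

Because the boundaries are themselves observations, transforming them by $F_0$ keeps every count unchanged, which is exactly what fails when the boundaries are fixed constants.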

In the case where the class boundaries are fixed
constants, $D_n$ is no longer distribution free. It can be
described as follows. Suppose the class boundaries are
fixed constants specified in advance, say $b_i$. The value of
$F_0(\cdot)$ at $b_i$ is $F_0(b_i)$. As before, make the change of
"variable" $y = F_0(b_i)$. However, since $b_i$ is constant,
$y = F_0(b_i)$ cannot be transformed to $Y = F_0(B_i)$. In other
words

$$D_n = \max_{x \in \{b_1, \dots, b_k\}} \left| F_0(x) - S_n^g(x) \right|
= \max_{i} \left| F_0(b_i) - S_n^g(b_i) \right|
\ne \max \left| y - H(y) \right|.$$

Therefore, it is suggested that Conover's procedure be used to
calculate critical levels for K-S(g) when the class boundaries
are fixed constants.


III. TEST PROCEDURE AND CRITICAL LEVEL



A. TEST PROCEDURE

Unlike the Chi-square test and the Kolmogorov-Smirnov
test, in which critical values are calculated
which correspond to selected critical levels, with Conover's
procedure one calculates critical levels which correspond
to selected critical values. The procedure is as follows.

Let $X_1, X_2, \dots, X_n$ be a random sample of size n, drawn
from some unknown discrete population distribution $F_X(\cdot)$.
The hypothesis is $H_0: F_X(\cdot) = F_0(\cdot)$, where $F_0(\cdot)$
has all parameters specified. The alternative is
$H_1: F_X(\cdot) \ne F_0(\cdot)$. The test statistic, either $D_n$,
$D_n^+$ or $D_n^-$ depending on which is desired, is calculated.
Further, the critical level is computed and if the value of this
critical level is greater than or equal to that specified in
advance (usually either 0.05 or 0.01), the null hypothesis is
accepted; otherwise it is rejected.

Let $\delta$ be some fixed real number, $0 < \delta < 1$. The critical
level of the test, that is $P[D_n \ge \delta]$ for the two-sided
Kolmogorov-Smirnov (discrete) test, can be computed by using
a procedure due to W.J. Conover (2). In fact this is an
approximation method, since $P[D_n \ge \delta]$ is obtained by
calculating $P[D_n^- \ge \delta]$ and $P[D_n^+ \ge \delta]$, where
$\delta = \max(\delta^-, \delta^+)$ and $\delta^-$, $\delta^+$ are the
observed values of $D_n^-$ and $D_n^+$ respectively.
$P[D_n \ge \delta]$ is taken to be approximately
$P[D_n^- \ge \delta] + P[D_n^+ \ge \delta]$. To see the
adequacy of this approximation, we proceed as follows:

$$P[D_n \ge \delta] = P[(D_n^+ \ge \delta) \cup (D_n^- \ge \delta)]
= P[D_n^+ \ge \delta] + P[D_n^- \ge \delta] - P[(D_n^+ \ge \delta)(D_n^- \ge \delta)] \qquad (1)$$

Since $P[D_n^- \ge \delta \mid D_n^+ \ge \delta] \le P[D_n^- \ge \delta]$,
equation (1) becomes:

$$P[(D_n^+ \ge \delta)(D_n^- \ge \delta)] \le P[D_n^+ \ge \delta] \, P[D_n^- \ge \delta].$$

An approximate value of $P[D_n \ge \delta]$ is thus
$P[D_n^+ \ge \delta] + P[D_n^- \ge \delta]$. The error
of this approximation is $P[(D_n^+ \ge \delta)(D_n^- \ge \delta)]$,
which is less than the product
$P[D_n^+ \ge \delta] \, P[D_n^- \ge \delta]$.

But in practice, $\delta$ is taken so that $P[D_n^+ \ge \delta]$ and
$P[D_n^- \ge \delta]$ are small, and then the product is quite small
and can be safely ignored. For example, suppose both one-sided
tests have the same critical level, say 0.05; then
$P[D_n \ge \delta] \approx 0.10$ with an error less than
(0.05)(0.05) = 0.0025.

B. CRITICAL LEVEL

Let $\delta^- = \max_{-\infty < x < \infty} (F_0(x) - S_n(x))$, where
$S_n(x)$ represents the empirical cumulative distribution
function of the sample (Fig. 1).

$\delta^-$ is the maximum vertical difference between the
expected cumulative distribution function and the observed
cumulative distribution function. Draw horizontal lines
$\ell_j$ with intercept $\delta^- + \frac{j}{n}$, where
$0 \le j \le n(1 - \delta^-)$. Then compute the value of
$f_j = 1 - (\delta^- + \frac{j}{n})$. For j = 0 the line $\ell_0$ has
intercept $\delta^-$. The value of $f_0$ is $f_0 = 1 - \delta^-$.
For j = 1 the intercept of $\ell_1$ is $\delta^- + \frac{1}{n}$ and
$f_1 = 1 - (\delta^- + \frac{1}{n})$, etc.

If the horizontal line $\ell_j$ intersects $F_0(\cdot)$ at a jump
point (at a point x of discontinuity), the value of $f_j$ is 1
minus the ordinate $F_0(x)$ at the top of the jump.

As shown by Conover (2), the critical level
$P[D_n^- \ge \delta^-]$ is:

$$P[D_n^- \ge \delta^-] = \sum_{j=0}^{n(1 - \delta^-)} \binom{n}{j} f_j^{\,n-j} c_j \qquad (2)$$

where $c_0 = 1$ and

$$c_k = 1 - \sum_{j=0}^{k-1} \binom{k}{j} f_j^{\,k-j} c_j \qquad (3)$$

k is defined to be the largest value of the subscript j
such that $f_j > 0$.

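Conover's derivation and proof are tedious and are
omitted here. For example, for k = 5 equation (3) becomes:

$$\begin{aligned}
c_0 &= 1 \\
c_1 &= 1 - f_0 \\
c_2 &= 1 - f_0^2 - 2 f_1 c_1 \\
c_3 &= 1 - f_0^3 - 3 f_1^2 c_1 - 3 f_2 c_2 \\
c_4 &= 1 - f_0^4 - 4 f_1^3 c_1 - 6 f_2^2 c_2 - 4 f_3 c_3
\end{aligned}$$

To calculate $P[D_n^+ \ge \delta^+]$, where
$\delta^+ = \max_{-\infty < x < \infty} (S_n(x) - F_0(x))$, draw
horizontal lines $\ell_j$ with intercepts
$1 - (\delta^+ + \frac{j}{n})$. The value of $f_j$ is
$1 - (\delta^+ + \frac{j}{n})$, but if the horizontal line $\ell_j$
intersects $F_0(\cdot)$ at a jump x, then the value of $f_j$ is
the height of $F_0(x)$ at the bottom of the jump. For this
critical value, equation (2) becomes:

$$P[D_n^+ \ge \delta^+] = \sum_{j=0}^{n(1 - \delta^+)} \binom{n}{j} f_j^{\,n-j} c_j \qquad (4)$$

where $c_j$ is given in equation (3).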

Suppose we have a sample of size n grouped into k
intervals, having $m_i$ observations for the i-th interval,
so $\sum_{i=1}^{k} m_i = n$. Let $b_i$ be a fixed class boundary of
the i-th interval and $S_n^g(\cdot)$ be the empirical cumulative
distribution function of the grouped sample. $S_n^g(\cdot)$ is a
step function which rises by jumps of $d_i$ where $0 \le d_i \le 1$.

For example, for i = 1, the jump occurs at $b_1$ with the
value of $d_1$. The first jump of $S_n^g(\cdot)$ occurs at the first
class boundary. For i = 2, the second class boundary is $b_2$
and at this point the value of the step function is
$(d_1 + d_2)$ or $\sum_{i=1}^{2} d_i$.

So, the step function $S_n^g(\cdot)$ can be written as:

$$S_n^g(x) = \begin{cases}
0 & \text{for } x < b_1 \\
\sum_{i=1}^{j} d_i & \text{for } b_j \le x < b_{j+1} \\
1 & \text{for } x \ge x_{(n)}
\end{cases}$$

where

$$\sum_{i=1}^{j} d_i = \frac{1}{n} (\text{the number of } X_{(l)} \le b_j).$$

In using Conover's procedure for K-S(g), let $\delta_g^-$ be the
maximum vertical distance between $F_0(\cdot)$ and $S_n^g(\cdot)$; that is

$$\delta_g^- = \max_{x \in \{b_1, \dots, b_k\}} (F_0(x) - S_n^g(x)),$$

or it can be written (Fig. 2) as:

$$\delta_g^- = \max_{i} (F_0(b_i) - S_n^g(b_i)).$$

Figure 2

As before, draw the horizontal lines $\ell_j$ and then compute the
values of $f_j$. The values of $c_j$ are calculated by using
equation (3) and finally the critical level is computed by
equation (4).
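The construction of the grouped step function and of the one-sided distances can be sketched as follows. The class counts (5, 3, 7) and the hypothesized ordinates (.3624, .7791, 1) are used here only as illustrative inputs, and the function names are assumptions introduced for this sketch.

```python
from itertools import accumulate


def grouped_edf(counts):
    """Values of the grouped EDF at the class boundaries: cumulative sums of m_i / n."""
    n = sum(counts)
    return [c / n for c in accumulate(counts)]


def one_sided_distances(counts, f0_ordinates):
    """Return (delta_minus, delta_plus) evaluated at the class boundaries."""
    s = grouped_edf(counts)
    delta_minus = max(f - sg for f, sg in zip(f0_ordinates, s))  # max of F_0 - S
    delta_plus = max(sg - f for f, sg in zip(f0_ordinates, s))   # max of S - F_0
    return delta_minus, delta_plus


counts = [5, 3, 7]                      # m_i, observations per class
f0_ordinates = [0.3624, 0.7791, 1.0]    # F_0 at the upper boundary of each class
delta_minus, delta_plus = one_sided_distances(counts, f0_ordinates)
print(round(delta_minus, 4), round(delta_plus, 4))
```

With these inputs the grouped EDF is 5/15, 8/15 and 1, so the largest value of $F_0 - S_n^g$ is .7791 - .5333.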

As the sample size is increased, the calculation (if
done by hand) will be more tedious, so this procedure should
be used for small sample sizes, say less than 30. Since
the critical level is a decreasing function of $\delta_g^+$ and
$\delta_g^-$, the bigger the value of this maximum vertical
difference between $F_0(\cdot)$ and $S_n^g(\cdot)$, the smaller will
be the critical level, and vice versa. For illustration the
following is an example of how to use Conover's procedure.
Suppose a random sample of size 16 is drawn from some
population $F_X(\cdot)$. The observed values are (ordered):

    10.35    12.04    14.16
    10.40    12.53    14.26
    10.42    12.63    14.40
    11.46    13.18    14.47
    11.49    13.33    14.84
             13.54

The hypothesis is $H_0: X \sim U(10, 20)$ with alternative
$H_1: X \not\sim U(10, 20)$. The samples are grouped into three
intervals. The values of the class boundaries are chosen
to be $b_1 = 12$, $b_2 = 14$, so the three intervals are
$(-\infty, 12)$, $(12, 14)$ and $(14, \infty)$. (Fig. 3)




The value of the step function at the first class
boundary is $S_n^g(b_1) = \frac{1}{16}$ (the number of $X_j \le 12$)
= .3125; at the second class boundary it is
$S_n^g(b_2) = \frac{1}{16}$ (the number of $X_j \le 14$) = .6875.
Under the null hypothesis, the theoretical rates for these
three intervals are $\frac{12 - 10}{10} = .20$,
$\frac{14 - 10}{10} = .40$ and $\frac{14.84 - 10}{10} = .484$
respectively.

Here the maximum vertical difference is 1 - .484
= .5160. The line $\ell_0$ with intercept .5160 intersects
$F_0(\cdot)$ at a jump point so $f_0 = 1 - .6875 = .3125$; $\ell_1$
with intercept $.5160 + \frac{1}{16} = .5785$ also intersects
$F_0(\cdot)$ at a jump point so $f_1 = .3125$; and finally the
intercept of $\ell_2$ is .6410, so $f_2 = .3125$. The value of k
in formula (3) is equal to 2 since $\ell_3$ intersects $F_0(x)$ at
a jump point with top ordinate 1.00, so $f_3 = 0$, which is not
considered. By using formula (3), $c_j$ is computed and found
to be $c_0 = 1$, $c_1 = .6875$, $c_2 = .4727$. Thus,

$$P[D_n^+ \ge .5160] = (.3125)^{16} + \binom{16}{1} (.3125)^{15} (.6875)
+ \binom{16}{2} (.3125)^{14} (.4727) = .000005.$$

Since $F_0(x)$ is symmetric, $P[D_n^- \ge .5160]$ is equal to
the above, so that $P[D_n \ge .5160] = 2 \times .000005 = .00001$.
Using $\alpha = .01$ the null hypothesis should be rejected since
0.00001 < 0.01. Incidentally this is a correct decision
in this case, since the above random sample was actually
drawn from a population having distribution U(10, 15).
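The arithmetic of this example can be reproduced with a short sketch of equations (2) to (4), taking the three values $f_0 = f_1 = f_2 = .3125$ and n = 16 quoted above as given. The function names are assumptions introduced for illustration, and the sketch does not re-derive the $f_j$ values themselves.

```python
from math import comb


def conover_coefficients(f):
    """c_0 = 1 and c_k = 1 - sum_{j<k} C(k, j) * f_j**(k-j) * c_j, as in equation (3)."""
    c = [1.0]
    for k in range(1, len(f)):
        c.append(1.0 - sum(comb(k, j) * f[j] ** (k - j) * c[j] for j in range(k)))
    return c


def conover_level(n, f):
    """Sum of C(n, j) * f_j**(n-j) * c_j over the supplied f_j, as in equations (2) and (4)."""
    c = conover_coefficients(f)
    return sum(comb(n, j) * f[j] ** (n - j) * c[j] for j in range(len(f)))


f = [0.3125, 0.3125, 0.3125]       # f_0, f_1, f_2 quoted in the example
c = conover_coefficients(f)        # expected to start 1, .6875, .4727
p_one_sided = conover_level(16, f)
print(c, p_one_sided, 2 * p_one_sided)
```

Doubling the one-sided value gives the two-sided approximation used in the text.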
IV. COMPARISON AND THE K-S(g) TEST POWER


The basic difference among the three tests, the Chi-square,
the K-S and the K-S(g), is that the Chi-square test
is sensitive to vertical deviation between the observed
and expected histogram, the K-S test is based on vertical
deviation between the observed and expected cumulative
distribution function, whereas K-S(g) is based on vertical
deviation between the observed and expected cumulative
distribution function associated with discrete groups.
Another obvious difference is that the K-S statistic is
distribution free, whereas the K-S(g) statistic is not
distribution free.

Both Chi-square and K-S(g) require that the data be
grouped; in contrast, the K-S test does not. Therefore
when the underlying distribution is continuous the K-S test
permits us to investigate the goodness-of-fit with information
from each observation. By contrast, both Chi-square
and K-S(g) lose some information since individual observations
are grouped into a relatively small number of classes.
Further, the Chi-square and K-S(g) tests are affected by
the number and the length of the class intervals, which are
chosen arbitrarily by the experimenter. The following is
another example of applying the K-S(g), which is followed
by comparison with the Chi-square test. Suppose we have a
random sample of size n = 15, drawn from some population
$F_X(\cdot)$. The observations are (ordered):

     2.01     3.91    14.89
     2.08     5.32    15.27
     2.24     9.09    21.34
     2.52             29.73
     2.70             36.54
                      39.43
                      40.74



The hypothesis to be tested is that the sample has come from
an exponential distribution with $\lambda = 6$, at $\alpha = 0.05$.

$H_0: X \sim EXP(6)$

$H_1: X \not\sim EXP(6)$

Suppose the sample had been divided into three groups
associated with the intervals (0, 2.70), (2.70, 9.09) and
(9.09, $\infty$). Under the null hypothesis the expected grouped
c.d.f. has ordinates .3624, .7791 and 1 respectively. The
observed and expected frequencies for each group are
5, 3, 7 and 5.436, 6.2505, 3.3135 respectively.

The maximum vertical difference $\delta_g$ occurs at x = 9.09,
where $\delta_g^- = .7791 - .5333 = .2458$. The values of $f_j$ and
$c_j$ are computed as in the previous example and are found to be:

    j     f_j      c_j
    0    .6376    1
    1    .6376    .3624
    2    .2209    .1314
    3    .2209    .2117
    4    .2209    .2335
    5    .2209    .2197
    6    .2209    .1914
    7    .2209    .1585
    8    .2209    .1287

With those values of $f_j$ and $c_j$, $P[D_n^- \ge .2458]$ is found
to be .0395. Using a similar procedure it can be seen that
$P[D_n^+ \ge .2458] = .0162$, so $P[D_n \ge .2458]$ is approximately
0.0395 + 0.0162 = 0.0557. The hypothesis is
accepted at $\alpha = 0.05$.
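The whole calculation for this example can be sketched in a few lines, under stated assumptions. The hypothesized c.d.f. is represented only by its ordinates at the class boundaries (.3624, .7791, 1), so every horizontal line meets it at a jump. The function names, the tolerance `TOL`, and the handling of a line that coincides exactly with an ordinate are choices made for this sketch, not part of Conover's published procedure as quoted here.

```python
from math import comb, floor

TOL = 1e-9  # guards against floating-point ties between a line and an ordinate


def conover_level(n, f):
    """Equations (2)/(4): sum of C(n, j) * f_j**(n-j) * c_j, with c_j from equation (3)."""
    c = [1.0]
    for k in range(1, len(f)):
        c.append(1.0 - sum(comb(k, j) * f[j] ** (k - j) * c[j] for j in range(k)))
    return sum(comb(n, j) * f[j] ** (n - j) * c[j] for j in range(len(f)))


def f_lower(n, d, ordinates):
    """f_j for P[D_n^- >= d]: 1 minus the top of the jump met at height d + j/n."""
    values = []
    for j in range(floor(n * (1 - d) + TOL) + 1):
        height = d + j / n
        top = min(F for F in ordinates if F >= height - TOL)
        values.append(1.0 - top)
    return values


def f_upper(n, d, ordinates):
    """f_j for P[D_n^+ >= d]: bottom of the jump met at height 1 - d - j/n."""
    levels = [0.0] + list(ordinates)
    values = []
    for j in range(floor(n * (1 - d) + TOL) + 1):
        height = 1 - d - j / n
        values.append(max(F for F in levels if F <= height + TOL))
    return values


n = 15
ordinates = [0.3624, 0.7791, 1.0]   # grouped hypothesized c.d.f. from the example
d = 0.7791 - 8 / 15                 # observed maximum difference, about .2458

p_minus = conover_level(n, f_lower(n, d, ordinates))
p_plus = conover_level(n, f_upper(n, d, ordinates))
p_two_sided = p_minus + p_plus
print(round(p_minus, 4), round(p_plus, 4), round(p_two_sided, 4))
```

Terms with $f_j = 0$ contribute nothing to the sum, so it is not necessary to truncate the list at the largest j with $f_j > 0$.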

The value of the Chi-square statistic on the
same sample, with three groups and hence 2 degrees of
freedom, is 5.3936. This is less than 5.99, the
critical value for $\alpha = 0.05$. Again the hypothesis would be
accepted; that is, both tests accept that the random sample
has been obtained from a population having exponential
distribution with $\lambda = 6$. Using interpolation in tables of
the incomplete gamma function, the critical value of the
Chi-square statistic associated with $\alpha = 0.0557$ is found
to be approximately 5.7724. For $\alpha = 0.06$ this approximate
value is 5.6275. If we had used $\alpha = 0.06$ rather than 0.05,
the K-S(g) test would have rejected the null hypothesis
since 0.0557 < 0.06. However, the Chi-square test would
have accepted the null hypothesis since 5.3936 is less
than 5.6275. The Chi-square test will reject the null
hypothesis at any $\alpha$ level which is greater than 0.06.
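For 2 degrees of freedom the Chi-square upper tail has the closed form $P[\chi^2_{(2)} > x] = e^{-x/2}$, so the critical values quoted above can be checked without tables. The sketch below is an independent check, not the interpolation the text used; the function names are assumptions.

```python
from math import exp, log


def chi2_2df_critical(alpha):
    """Upper-alpha critical value of chi-square with 2 df, from P[X > x] = exp(-x/2)."""
    return -2.0 * log(alpha)


def chi2_2df_tail(x):
    """Upper tail probability of chi-square with 2 df."""
    return exp(-x / 2.0)


critical_values = {alpha: chi2_2df_critical(alpha) for alpha in (0.05, 0.0557, 0.06)}
for alpha, value in critical_values.items():
    print(alpha, round(value, 4))
```

The closed form gives roughly 5.99, 5.78 and 5.63, which agrees with the interpolated values 5.7724 and 5.6275 to about two decimal places.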

In fact, the random sample for the present example
was generated from a population having an exponential
distribution with mean 15. Although the critical level
$\alpha = 0.06$ is not commonly used, the Chi-square test cannot
detect that the random sample did not come from an exponential
distribution with mean 6.

From the example in Chapter III, it was found that the
null hypothesis was rejected at $\alpha = 0.01$. With the same data,
with three groups, hence 2 degrees of freedom, under the
null hypothesis the value of the Chi-square test statistic
is 13.4075. This is greater than 10.6, the $\chi^2_{(2)}$ value at
$\alpha = 0.05$. Thus, both tests reject the null hypothesis at
$\alpha = 0.05$. The critical value of the Chi-square test
associated with $\alpha = 0.00001$ is found to be approximately
23.4, which means that if we had used $\alpha = 0.00001$, the
Chi-square test would have accepted the null hypothesis, since
13.4075 < 23.4. Of course, the critical level $\alpha = 0.00001$
is rarely used in practice. The acceptance of the null
hypothesis by the Chi-square test at $\alpha = 0.00001$ is most
likely caused by the very low value of the expected
frequency in the fourth group of the sample, that is 1.344.

Suppose the null hypothesis is $X \sim U(10, 18)$ with
critical level $\alpha = 0.05$. Under this null hypothesis, the
value of the Chi-square statistic is 5.320, which is less
than 10.6. In this case, the Chi-square test accepts the
null hypothesis, or in other words, the Chi-square test
accepts that the random sample was generated from a
population having U(10, 18) distribution. Again in this
case the Chi-square test has low expected frequencies in
the fourth group, namely 1.92.

For the K-S(g) test, under the null hypothesis
U(10, 18), the value of $\delta_g$ is .395, so $P[D_n \ge .395]$ =
0.00008, which is less than 0.05. The K-S(g) test rejects
the null hypothesis, which means it rejects that the random
sample was generated from a population having U(10, 18)
distribution.

The present examples suggest the K-S(g) may be more
powerful than the Chi-square test, at least in certain
cases. Simulations conducted to explore this question are
discussed below.

In the previous example, concerning the exponential
distribution, the critical value of the K-S statistic in
the continuous case at $\alpha = 0.0557$ is between .304 and .338,
which leads us to reject the null hypothesis that the
sample was generated from an EXP($\lambda = 6$) distribution. At
the same critical level, for example 0.0557 or 0.00008,
grouping samples into intervals tends to lower the power.
Thus, for grouped data (samples), the appropriate critical
values are smaller than those tabulated for the continuous
case. Thus, use of tables of critical values of the K-S
test will not give a correct test. Therefore the need
for the K-S(g) procedure is evident.
V. SIMULATION



In order to get a better comparison between the three
goodness-of-fit tests previously mentioned, especially
their powers, simulation was used. The powers of the tests
should be compared under the same conditions, namely at
the same significance level and for the same null hypothesis.
There are two procedures for this simulation. The first
procedure is as follows: generate a random sample of size
n from a population having distribution $F_X(\cdot)$. Specify
a critical level $\alpha$, for example $\alpha_1$, and find the critical
value $\delta_1$ of the K-S statistic associated with $\alpha_1$ and n.

By using tables of the incomplete gamma function, one
can calculate (at least approximately) the critical values
of the Chi-square statistic associated with $\alpha_1$. Let $C_1$
denote this critical value. These three values, $\alpha_1$, $\delta_1$,
$C_1$, are then used as the input to the simulation for computing
the relative frequency of acceptance of $H_0$ by the K-S(g),
the K-S for continuous distribution and the Chi-square tests,
respectively. Further, generate the random samples N times.
The hypothesis to be tested is $H_0: F_X(\cdot) = F_0(\cdot)$ and
$H_1: F_X(\cdot) \ne F_0(\cdot)$ at level $\alpha_1$.

Let $\hat{p}_1$, $\hat{p}_2$, $\hat{p}_3$ denote the relative frequency of
acceptance of $H_0$ by the K-S, the K-S(g) and the Chi-square
respectively; then

$$\hat{p}_1 = \frac{\text{number of times } (D_n \le \delta_1)}{N}$$

$$\hat{p}_2 = \frac{\text{number of times } (P[D_n \ge \delta_g] \ge \alpha_1)}{N}$$

$$\hat{p}_3 = \frac{\text{number of times } (\chi^2_{(r)} \text{ statistic} \le C_1)}{N}$$



Both acceptances of K-S and Chi-square test can be programmed.
However, it appears to be difficult to program Conover's
procedure. To the knowledge of the author, a subroutine
for this procedure is not available.

Therefore the simulation has been done indirectly,
using a second procedure. This can be described as follows.
With the same random samples from the same $F_X(\cdot)$ and the
same number of groups, then for $0 < d_1 \le d_2 < 1$,
$P[D_n \ge d_1] \ge P[D_n \ge d_2]$. If the hypothesis is accepted
at $P[D_n \ge d_2] = \alpha_2$, it will be accepted at
$P[D_n \ge d_1] = \alpha_1$, since $\alpha_1 \ge \alpha_2$. In other words
the sample rate of acceptance of K-S(g) is defined to be

$$\hat{p}_2 = \frac{\text{number of times } (D_n \le d)}{N}$$

where d is specified in advance. Once d is fixed,
$P[D_n \ge d] = \alpha_1$ can be computed, from which the critical
values for the other two tests associated with $\alpha_1$ can be
calculated. However, it will be difficult to obtain an
approximate value for the critical values of the K-S test
because of the limited $\alpha$ values in the tables. Therefore,
d is used to calculate the frequency of acceptance by both
K-S tests. So, the inputs for the second procedure are
$C_1$ and d.
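The second procedure can be sketched as a small Monte Carlo loop. This is an illustration only: the boundaries 3, 6, 9, 100, the null hypothesis EXP(mean 9), n = 15 and d = .3 follow the Exponential case described in this chapter, while the function names, the number of replications and the seeds are assumptions. Python's `random.expovariate` takes the rate, that is 1/mean.

```python
import math
import random


def grouped_d(sample, boundaries, cdf):
    """Two-sided grouped statistic: max |F_0(b) - S_n^g(b)| over the boundaries."""
    n = len(sample)
    return max(
        abs(cdf(b) - sum(1 for x in sample if x <= b) / n) for b in boundaries
    )


def acceptance_rate(draw, n, boundaries, cdf, d, n_rep, seed=0):
    """Relative frequency of D_n <= d over n_rep simulated samples of size n."""
    rng = random.Random(seed)
    accepted = 0
    for _ in range(n_rep):
        sample = [draw(rng) for _ in range(n)]
        if grouped_d(sample, boundaries, cdf) <= d:
            accepted += 1
    return accepted / n_rep


def exp_cdf(x, mean=9.0):
    """Hypothesized cdf: exponential with mean 9."""
    return 1.0 - math.exp(-x / mean) if x > 0 else 0.0


boundaries = [3.0, 6.0, 9.0, 100.0]
# Samples drawn under the null hypothesis (mean 9) and under an alternative (mean 15).
rate_null = acceptance_rate(lambda rng: rng.expovariate(1 / 9.0), 15,
                            boundaries, exp_cdf, 0.3, 4000, seed=0)
rate_alt = acceptance_rate(lambda rng: rng.expovariate(1 / 15.0), 15,
                           boundaries, exp_cdf, 0.3, 4000, seed=1)
print(rate_null, rate_alt)
```

One minus `rate_null` estimates the level of the test with d = .3, and one minus `rate_alt` estimates its power against the mean-15 alternative.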

For the purpose of this simulation, groups of 15 random
numbers were generated from Exponential distributions with
means $\lambda$ = 6, 9, 12 and 15; random samples of size 14 were
generated from Normal distributions with $\mu = 0$, $\sigma$ = 1, 3 and
5; random samples of size 16 were generated from Uniform
distributions U(10, 13), U(10, 15) and U(10, 17). The
value of d was chosen arbitrarily to be d = .3. The results
of this simulation are summarized in table (5.1).

The null hypothesis for the four Exponential cases was
$F_0$ = EXP($\lambda = 9$), with 4 groups given by the intervals (0, 3),
(3, 6), (6, 9) and (9, 100). By Conover's procedure
$P[D_n \ge d = .3] = \alpha_1 = 0.0328$. The null hypothesis for the
three Normal cases was N(0, 1). Four groups were used
with intervals $(-\infty, -1)$, $(-1, 0)$, $(0, 1)$ and $(1, \infty)$, and
$P[D_n \ge d = .3] = \alpha_1$ was found to be 0.0185. Finally, the
null hypothesis for the Uniform cases was U(10, 15), with
5 groups given by the intervals (10, 11), (11, 12), (12, 13),
(13, 14), (14, 17), and $P[D_n \ge d = .3] = \alpha_1$ was found to
be 0.022. The associated critical values for the Chi-square
test are approximately 8.7517, 8.00, 13.245 in the
Exponential, Normal and Uniform cases, respectively. These
three critical values together with the three $\alpha$ levels are
then used in the input to the simulation.

Simulation results give $\hat{p}_2$ = 97.97% for the acceptance
frequency of the K-S(g) test for the null $F_0$ = U(10, 15)
distribution, which means (1.00 - .9797) = 0.0203 is the
estimated rejection rate, in other words $\alpha_1 = 0.0203$. The
input $\alpha_1$ for the Uniform random cases is 0.022; the
difference 0.0017 between these is not significant. From table
(5.1) it can be seen that the corresponding difference for
the Exponential case is 0.0028 and for the Normal case is
0.0008, which are not significant. From the same table it
can be seen that the significance levels (for the null
hypothesis) for both the Chi-square and K-S(g) tests in
the Uniform and Exponential cases are very close.

For example, in the Exponential case, this difference
is only (.0356 - .0316) = .004. Therefore these tests can
be assumed to be at the same $\alpha$ level. In the Normal case,
by grouping the data, the value of $\alpha_1$ = 0.0177. This gives
a difference of 0.0008 compared with the input $\alpha_1$ = 0.0185.
However, the critical value of the Chi-square statistic
associated with $\alpha_1$ = 0.0185 and 3 degrees of freedom is 8.0,
which gives .9565 for the acceptance rate under the null
hypothesis. In other words the $\alpha$ value here is 0.0435,
which is not sufficiently close to 0.0185 since the
difference is 0.0250. To get a closer value, the critical value
of 8.0 was adjusted to 10.50, which gave $\alpha_1$ = 0.0126. This
adjustment causes the other two acceptance rates, for N(0, 9)
and N(0, 25), to be larger.

With this adjustment all the tests, Chi-square, K-S
and K-S(g), are at (nearly) the same $\alpha$ level, so comparison
of their powers is easily performed.

For the U(10, 17) case the powers of the Chi-square
and the K-S(g) are 0.2208 and 0.2426, and for U(10, 13) both
are equal to 1.00. It means that in the U(10, 17) case
the K-S(g) test appears to be more powerful than the
Chi-square test, whereas in the U(10, 13) case both tests have
the same power. For the Exponential case with parameter
6, 12 and 15 the K-S(g) test turns out to
be more powerful than the Chi-square test in these
Exponential and Normal cases. Looking at the results of
another simulation, using three samples generated from
U(0, 18), U(0, 20) and U(0, 22) distributions, where the
null hypothesis to be tested was the Exponential distribution
with mean $\lambda = 9$, which are also tabulated in table (5.1),
the K-S(g) appears to be more powerful than the Chi-square
test.

For the Normal case, the Chi-square test turns out to
be more powerful than the K-S(g), even though the expected
frequencies in the first and fourth groups were very low,
only 2.220 and 2.218. Of course, the Chi-square distribution
in these applications is only an approximation. This
approximation gives better results for larger samples. It
is a rule of thumb that the approximation can be used with
confidence as long as every expected frequency is at least
equal to 5. This rule should not be considered inflexible,
however. It appears to be a conservative value and the
Chi-square approximation was reasonably accurate in this
study, even for expected cell frequencies as small as 1.5
(6).
Based on the results of the simulation for Exponential
cases where two of the four groups have low expected
frequencies (namely 3.0315 and 2.2350), the approximation
appears reasonably accurate. The estimated value of $\alpha_1$
obtained from this simulation is $\alpha_1$ = 0.0316, which is close
to the "target," $\alpha_1$ = 0.0328. By contrast, for the Normal
case, the approximation is not so accurate. Thus, it was
adjusted to get a closer value, as mentioned before. This
inaccurate approximation may be caused by the small
expected frequencies in the tails.

Table (5.2) shows results of yet another simulation.
By keeping the variance constant and letting the means vary,
again the K-S(g) test appears to be more powerful than
the Chi-square test.

In all cases, without grouping the data, in other words,
in the continuous case, the ordinary K-S test is the most
powerful test. Existing tables for the K-S distribution
indicate the critical value $\delta$ = .3 corresponds to an $\alpha$ level
somewhere between .10 and .15. To get the same, or at
least close to, the critical level used for the K-S(g), for
example 0.0328 for the Exponential case, d = .3 should be
increased. In other words, the value of $\delta$ for the continuous
case would be greater. Again, grouping the data and
holding the same $\alpha$ level requires the critical value of the
K-S test to be smaller than that tabulated.
VI. SUMMARY



1. Based on the results of the simulations, grouping the
data causes the K-S statistic to be stochastically
smaller than the ordinary tabulated K-S
statistic. For continuous underlying distribution
functions the K-S test is the most powerful of the
three tests considered.

2. It is hard to draw a general conclusion related to
the relative powers of the K-S(g) and Chi-square
tests. In some cases the K-S(g) test is more powerful
than the Chi-square test; in other cases the reverse
is true. It is suggested that the relative powers be
investigated in more detail. Further investigation is
also needed to determine rules of thumb for the
appropriate number of groups to be used.

3. For small samples, the ordinary table of the K-S
statistic cannot be used for grouped data. The extension
of the K-S test suggested herein should be used when
the data have been grouped. Unfortunately, for sample
sizes larger than 30 the calculation becomes tedious.

4. It would be very worthwhile to program a subroutine for
Conover's procedure. The availability of the subroutine
would enable this test to be used without time-consuming
calculations, and also possibly make the test available
for larger sample sizes.

5. It is suggested that the possibility of developing a
"quick and dirty" modification of the K-S table for use
in discrete and grouped cases be investigated.

6. This simulation also demonstrated the adequacy of
the Chi-square approximation, even when expected cell
frequencies fell substantially below 5.

7. Care should be taken in computing $\delta_g$, the maximum
vertical distance between $F_0(x)$ and $S_n^g(x)$. This
maximum distance is equal to the greatest value of
$|F_0(b_i) - S_n^g(b_i)|$, where $b_i$ ranges over the set of
class boundaries.